Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow: Add support for speculative taint flow. #17663

Merged
merged 15 commits into from
Oct 31, 2024

Conversation

aschackmull
Copy link
Contributor

@aschackmull aschackmull commented Oct 4, 2024

This adds support for speculative taint flow in the shared taint tracking library.

What is this?

This is a magic button (dial, really) that you can turn to calculate more taint flow in order to identify false negatives. So if you suspect a FN, e.g. if you're failing to find a flow for a CVE or you're facing zero results thinking that we might be missing some models, then try this!

How does it work?

Each language provides a huge candidate set of potential taint steps. The default set that I've implemented is simply any argument to any return value (plus potential side-effects on the this argument, if any) on any call for which we don't yet have an existing model or a call target within the analyzed source.
The shared library will then execute the regular taint flow, but in addition it will allow speculative flow steps drawn from this candidate set up to a specified maximum number of such edges along a given path.
It will then report flow in the usual way, and the chosen speculative edges will be visible in the path explanation with the provenance label "Speculative". (In the VSCode plugin this shows up as "(step) Speculative".)

I want to try it, show me how!

It's easy, just replace the application of the TaintTracking::Global module with TaintTracking::SpeculativeGlobal. So if you e.g. have

module MyQueryFlow = TaintTracking::Global<MyQueryConfig>;

then replace that with

int speculationLimit() { result = 10 }
module MyQueryFlow = TaintTracking::SpeculativeGlobal<MyQueryConfig, speculationLimit/0>;

The number you choose in the speculationLimit is the limit on the number of speculative steps that can be used in a path. A higher number gives more flow, but worse performance. Expect a performance degradation factor roughly equal to the chosen limit.

Testing so far, and followup work for the individual language teams

I've tested this for Java and C# with a number queries on their respective MRVA top100 with good results and reasonable performance. For the remaining languages, the candidate set of edges might need further tweaking to e.g. exclude things that happen to be calls, but which shouldn't be considered as potential taint steps. For C# I e.g. had to reduce the set to "only" include method and constructor calls, i.e. no operator nor property calls (I believe the latter is already included as read/store steps).


signature int speculationLimitSig();

private module AddSpeculativeTaintSteps<

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
module SpeculativeFlow<DataFlow::ConfigSig Config, speculationLimitSig/0 speculationLimit>
implements DataFlow::GlobalFlowSig
{
private module Config0 implements DataFlowInternal::FullStateConfigSig {

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
}
}

private module C implements DataFlowInternal::FullStateConfigSig {

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
DataFlow::StateConfigSig Config, speculationLimitSig/0 speculationLimit> implements
DataFlow::GlobalFlowSig
{
private module Config0 implements DataFlowInternal::FullStateConfigSig {

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
}
}

private module C implements DataFlowInternal::FullStateConfigSig {

Check warning

Code scanning / CodeQL

Data flow configuration module naming Warning

Modules implementing a data flow configuration should end in Config.
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a really useful tool. I've done something similar before during Swift development, it was crude (necessarily as we didn't have provenance labels at the time), but helpful nevertheless.

I gather it's not (currently) possible to combine speculative taint flow with an existing flow state in the query?

Is it possible for any of the changes to affect performance when speculation is not being used?

@aschackmull
Copy link
Contributor Author

I gather it's not (currently) possible to combine speculative taint flow with an existing flow state in the query?

No, I can happily report that combining the two very much is supported!

Is it possible for any of the changes to affect performance when speculation is not being used?

No.

@aschackmull
Copy link
Contributor Author

For Ruby, I've now made some exclusions guided by the consistency check. For review please check if those are reasonable.

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very neat!

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a renaming suggestion.

* Constructs a global taint tracking computation that also allows a given
* maximum number of speculative taint steps.
*/
module SpeculativeFlow<DataFlow::ConfigSig Config, speculationLimitSig/0 speculationLimit>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it should be named SpeculativeGlobal instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that. I'll push a rename shortly.

* Constructs a global taint tracking computation using flow state that also
* allows a given maximum number of speculative taint steps.
*/
module SpeculativeFlowWithState<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same renaming suggestion.

@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Oct 30, 2024
Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C# LGTM!

@aschackmull aschackmull merged commit b556590 into github:main Oct 31, 2024
60 of 61 checks passed
@aschackmull aschackmull deleted the dataflow/speculative-flow branch October 31, 2024 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants